metagenomic dataset
GFSNetwork: Differentiable Feature Selection via Gumbel-Sigmoid Relaxation
Wydmański, Witold, Śmieja, Marek
Feature selection in deep learning remains a critical challenge, particularly for high-dimensional tabular data where interpretability and computational efficiency are paramount. We present GFSNetwork, a novel neural architecture that performs differentiable feature selection through temperature-controlled Gumbel-Sigmoid sampling. Unlike traditional methods, where the user has to define the requested number of features, GFSNetwork selects it automatically during an end-to-end process. Moreover, GFSNetwork maintains constant computational overhead regardless of the number of input features. We evaluate GFSNetwork on a series of classification and regression benchmarks, where it consistently outperforms recent methods including DeepLasso, attention maps, as well as traditional feature selectors, while using significantly fewer features. Furthermore, we validate our approach on real-world metagenomic datasets, demonstrating its effectiveness in high-dimensional biological data. Concluding, our method provides a scalable solution that bridges the gap between neural network flexibility and traditional feature selection interpretability. We share our python implementation of GFSNetwork at https://github.com/wwydmanski/GFSNetwork, as well as a PyPi package (gfs_network).
Inflammatory Bowel Disease Biomarkers of Human Gut Microbiota Selected via Ensemble Feature Selection Methods
Hacilar, Hilal, Nalbantoglu, O. Ufuk, Aran, Oya, Bakir-Gungor, Burcu
The tremendous boost in the next generation sequencing and in the omics technologies makes it possible to characterize human gut microbiome (the collective genomes of the microbial community that reside in our gastrointestinal tract). While some of these microorganisms are considered as essential regulators of our immune system, some others can cause several diseases such as Inflammatory Bowel Diseases (IBD), diabetes, and cancer. IBD, is a gut related disorder where the deviations from the healthy gut microbiome are considered to be associated with IBD. Although existing studies attempt to unveal the composition of the gut microbiome in relation to IBD diseases, a comprehensive picture is far from being complete. Due to the complexity of metagenomic studies, the applications of the state of the art machine learning techniques became popular to address a wide range of questions in the field of metagenomic data analysis. In this regard, using IBD associated metagenomics dataset, this study utilizes both supervised and unsupervised machine learning algorithms, i) to generate a classification model that aids IBD diagnosis, ii) to discover IBD associated biomarkers, iii) to find subgroups of IBD patients using k means and hierarchical clustering. To deal with the high dimensionality of features, we applied robust feature selection algorithms such as Conditional Mutual Information Maximization (CMIM), Fast Correlation Based Filter (FCBF), min redundancy max relevance (mRMR) and Extreme Gradient Boosting (XGBoost). In our experiments with 10 fold cross validation, XGBoost had a considerable effect in terms of minimizing the microbiota used for the diagnosis of IBD and thus reducing the cost and time. We observed that compared to the single classifiers, ensemble methods such as kNN and logitboost resulted in better performance measures for the classification of IBD.
- Asia > Philippines > Luzon > National Capital Region > City of Manila (0.14)
- Asia > Middle East > Republic of Türkiye > Kayseri Province > Kayseri (0.04)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
- Health & Medicine > Therapeutic Area > Hepatology (1.00)
- Health & Medicine > Therapeutic Area > Gastroenterology (1.00)
- Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.35)
Machine Learning Meta-analysis of Large Metagenomic Datasets: Tools and Biological Insights
Shotgun metagenomic analysis of the human associated microbiome provides a rich set of microbial features for prediction and biomarker discovery in the context of human diseases and health conditions. However, the use of such high-resolution microbial features presents new challenges, and validated computational tools for learning tasks are lacking. Moreover, classification rules have scarcely been validated in independent studies, posing questions about the generality and generalization of disease-predictive models across cohorts. In this paper, we comprehensively assess approaches to metagenomics-based prediction tasks and for quantitative assessment of the strength of potential microbiome-phenotype associations. We develop a computational framework for prediction tasks using quantitative microbiome profiles, including species-level relative abundances and presence of strain-specific markers.